In Natural Immune Systems (NIS), adaptive and emergent behaviors result from the behavior of
each cell and its interactions with other cells and the environment. Modeling and simulating an
NIS requires aggregating these cognitive interactions between the individual cells and the
environment. In recent years, Fuzzy Cognitive Maps (FCMs) have been shown to be a convenient
tool for modeling, controlling and simulating complex systems. In this paper, a new type of
learning fuzzy cognitive map (LFCM) is proposed as an extension of the traditional FCM for
modeling complex adaptive systems (CASs). Our approach rests on two major ideas: the first is to
increase the reinforcement learning capabilities of the FCM by using an adaptation of the
Q-learning technique, and the second is to foster diversity of concept states within the FCM by
adopting an IF-THEN rule-based system. By modeling and simulating the response of the natural
immune system, we show the effectiveness of the proposed approach in modeling CASs.

Keywords: Immune System Response, Reinforcement Learning


Copyright © 2016 Institute of Advanced Engineering and Science. 
All rights reserved. 


Corresponding Author:

Ahmed Tlili,
Department of Computer Science and its Applications,
New Faculty of Information Technologies and Communication, Abdelhamid Mehri University,
Constantine, Algeria.
Email: agent25000@yahoo.com


1. INTRODUCTION 

A Complex Adaptive System (CAS) [21] is defined as a collection of entities (agents) with simple
rules of behavior, immersed in a dynamic and unknown environment and able to adapt to it by learning
from experience. The overall adaptation to the environment emerges from the adaptive local behavior
of the entities.

Many biological and social systems found in nature are CASs: the immune system, bird flocks, the
cell, insect colonies, the brain, economic markets, etc. All these systems are characterized by two
key concepts: the emergence of global behavior, which is due to the lack of centralized control, and
self-organization, i.e. adaptation to the environment through learning.

In Natural Immune Systems (NIS), emergent behaviors result from the behavior of each
individual cell and its interactions with the environment. Modeling an NIS requires incorporating
these adaptive interactions among the individual cells and the environment. Modeling approaches for
NIS fall into two categories: mathematical models, which generally take the form of partial
differential equations, and cell-based models, which simulate the behavior of each individual cell
and the interactions between cells, enabling observation of the emergent behavior. This study
focuses on cell-based models of NIS and, mainly, on the technical aspects of a fuzzy rule-based
simulation method for NIS. We discuss how to implement the cell behaviors and the interactions with
the environment in the computational domain. The system behavior described in this paper is the
mechanism of differentiation between self and non-self cells.
Lastly, to better understand why the NIS is considered a complex adaptive system, the
following points may be relevant:


1. NIS is a decentralized system. 

2. NIS mechanism is a cognitive task. 

3. There are similarities between NIS and adaptive social insect colonies, e.g. ants and bees.

4. The overall behavior adaptation of NIS appears through the local behavior adaptation of each cell. 


In the literature, agent-based models (ABMs) and cellular automata (CAs) are two of the commonly
used methodologies for modeling NIS [16, 17]. ABMs are criticized for their complexity, whereas
CAs are criticized for their lack of an environment.

Recently, many studies have used FCMs and their multiple extensions [1-2, 19] to model complex
systems, of which CASs are a special case, and have given encouraging results [3-5]. In this paper
we present an approach for modeling CASs that connects fuzzy cognitive map (FCM) theory with the
adaptive reinforcement learning algorithm Q-learning. In addition, a linear weight adaptation method
based on the Hebbian learning algorithm [7], developed for neural networks, is used to train the FCM
by updating only its initial weights.


2. RESEARCH METHOD 

The Natural Immune System (NIS) is one of the most advanced and complex adaptive biological
systems. In order to maintain independence and avoid autoimmunity, a living organism must
prevent invasion by numerous microorganisms and harmful substances from the environment, and must
handle those that do enter. The NIS has a double objective: it has to maximize the elimination of
harmful antigens and at the same time minimize harm to self (autoimmunity). Immunity includes
functions to distinguish between self and non-self components (cells), and to remove the latter. The
immune response process is depicted in Figure 1.
Figure 1. Natural immune response mechanisms


Immune responses are carried out by special cells called lymphocytes. These cells are equipped
with a kind of antenna to recognize non-self chemical structures (antigen determinants) that are not
among the self components.

B cells and T cells are the two major types of lymphocytes and are derived from hematopoietic stem
cells in the bone marrow. B cells are involved in the humoral immune response; they work chiefly by
secreting substances called antibodies into the body's fluids. T cells are involved in the
cell-mediated immune response and can be subdivided into two groups: the helper T cell (Th) and the
suppressor T cell (Ts). Activation of T cells by antigen-presenting cells (APCs) in lymph nodes is a
key starting event in many natural immune responses. Th cells are particularly important in the
immune system. Because Th cells cannot recognize antigen directly, the antigen has to be processed
in advance by accessory cells, the APCs. The activation of Th cells depends on the interaction of
T cell receptors (TCRs), which are molecules found on the surface of T cells that are generally
responsible for recognizing peptides bound to Major Histocompatibility Complex (MHC) molecules, with
peptides that are displayed on the surface of antigen-presenting cells. T cell receptors scan the
surface of APCs for specific peptides bound to MHC molecules. If the specific peptides are found,
the Th cell is activated and secretes interleukin (IL+) and various other chemical signals. The
secreted interleukin plays an important role in activating B cells. On the other hand, the
suppressor T cell (Ts) can secrete the suppressing signal interleukin (IL-) to inhibit the immune
response.
Figure 2. Our complex adaptive artificial immune system (CAAIS) modeled in the background of the
LFCM. After the activation process, the motor concept Th has two possible actions, SELF and NO-SELF,
as responses to the environment.


The proposed framework for modeling and simulating the natural immune system response as a complex
adaptive system is summarized in Figure 3. The main idea is a connection between fuzzy cognitive
maps and reinforcement learning. The first step is to model the system in the background of the
proposed approach, i.e. the FCM: we describe the reflector concepts, intermediate concepts and motor
concepts, and then determine and describe the links between concepts together with their weight
values. In the second step, the system learns from experience with the environment using a
reward-punishment procedure, i.e. reinforcement learning based on an adaptive Q-learning algorithm
(Figure 2).
Figure 3. Framework of the proposed approach


Reinforcement learning is known as an on-line method in machine learning theory; the interaction
between the modeled learning NIS and its environment (Figure 2) is its main source of intelligence.
One of the most widely used RL algorithms is Q-learning [9], which works by learning an action-state
value function that expresses the cumulative reward of taking a given action in a given state.
Figure 4. The reinforcement learning model.


2.1. Fuzzy Cognitive Maps 

The term cognitive map (CM) first appeared in 1948 in E. Tolman's article "Cognitive maps in rats
and men" [10], where it describes the abstract mental representation of space built by rats trained
to navigate a labyrinth. The term FCM (Fuzzy Cognitive Map) was introduced in 1986 by B. Kosko [2]
to describe a simple extension of CMs that combines fuzzy logic and artificial neural networks. This
robust combination gives FCMs a structure similar to that of artificial recurrent neural networks
(ARNNs). FCMs (Figure 5) can describe the complex behavior of entities. They are represented as
directed graphs whose nodes are concepts (classified into three types: sensory, motor and effector)
and whose arcs represent causal relationships between these concepts. Each arc from a concept $C_i$
to a concept $C_j$ is associated with a weight $w_{ij}$ reflecting a relationship of inhibition
($w_{ij} < 0$) or excitation ($w_{ij} > 0$). Each concept is associated with a degree of activation,
which represents its state at time $t$ and can be modified over time. The dynamics of an FCM can be
summarized in one cycle (from $t$ to $t+1$) by updating the activation vector.


Figure 5. An FCM as a graph 


The following gives a formal description of an FCM [6]. Let $K$ denote one of the rings
$\mathbb{Z}$ or $\mathbb{R}$, $\delta$ one of the numbers 0 or 1, and $V$ one of the sets
$\{0, 1\}$, $\{-1, 0, 1\}$ or $[-\delta, 1]$. Let $(n, t_0) \in \mathbb{N}^2$ and
$k \in \mathbb{R}_+^*$.
An FCM $F$ is a sixfold $(C, A, W, \mathcal{A}, f_a, R)$:

a. $C = \{C_1, \ldots, C_n\}$ is the set of $n$ concepts forming the nodes of a graph.
b. $A \subset C \times C$ is the set of arcs $(C_i, C_j)$ from $C_i$ to $C_j$.
c. $W: C \times C \to K$, $(C_i, C_j) \mapsto W_{ij}$, is a function from $C \times C$ to
$\mathbb{R}$ associating a weight $W_{ij}$ to a pair of concepts $(C_i, C_j)$, with $W_{ij} = 0$ if
$(C_i, C_j) \notin A$, or $W_{ij}$ equal to the weight of the arc if $(C_i, C_j) \in A$. Note that
$W(C \times C) = (W_{ij}) \in K^{n \times n}$ is a matrix of $M_n(\mathbb{R})$.
d. $\mathcal{A}: C \to V^{\mathbb{N}}$, $C_i \mapsto a_i$, is a function that maps each concept
$C_i$ to the sequence of its activation degrees: for $t \in \mathbb{N}$, $a_i(t) \in V$ is its
degree of activation at the moment $t$. We denote by $a(t) = [(a_i(t))_{i \in [1, n]}]^T$ the
vector of activations at the moment $t$.
e. $f_a \in (\mathbb{R}^n)^{\mathbb{N}}$ is a sequence of vectors of forced activations such that,
for $i \in [1, n]$ and $t \ge t_0$, $f_{a_i}(t)$ is the forced activation of the concept $C_i$ at
the moment $t$.
f. $(R)$ is a recurrence relationship on $t \ge t_0$ between $a_i(t+1)$, $a_i(t)$ and $f_{a_i}(t)$
for $i \in [1, n]$, indicating the dynamics of the map $F$.

$$(R): \forall i \in [1, n], \forall t \ge t_0, \quad
\begin{cases}
a_i(t_0) = 0 \\
a_i(t+1) = \sigma \circ g_i\left( f_{a_i}(t), \sum_{j \in [1, n]} W_{ji}\, a_j(t) \right)
\end{cases}$$




Figure 6. Cognitive maps standardizing function. 


The standardizing function $f$ maps the value of each concept into the range of admissible
values; depending on the mode, it can be binary, ternary or sigmoid. The value of each concept is
calculated with the original formula proposed by Kosko [2]:


$$A_i^{(t+1)} = f\left( \sum_{j=1,\, j \neq i}^{n} A_j^{(t)} W_{ji} \right)$$


An alternative takes the past history of the concepts into account and uses the following
equation:


$$A_i^{(t+1)} = f\left( A_i^{(t)} + \sum_{j=1,\, j \neq i}^{n} A_j^{(t)} W_{ji} \right)$$


Algorithm 1 shows the steps for calculating the output vector at each iteration.
Algorithm 1: Calculation of the output vector
Step 1: Read the input vector $A^{(t)}$ and the weight matrix $W$.
Step 2: Calculate the output vector $A^{(t+1)}$: $A^{(t+1)} = f\left(A^{(t)} \cdot W\right)$
Step 3: Apply the transfer function $f$ to the output vector $A^{(t+1)}$.
Step 4: Verify the termination conditions of the algorithm.
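
The steps of Algorithm 1 can be sketched in a few lines of Python. This is only an illustrative sketch, not the MATLAB implementation used in this work: the function names, the threshold at zero for the binary mode and the `keep_history` flag (which switches between Kosko's formula and the variant that adds the concept's own past value) are assumptions made for the example.

```python
def binary_threshold(x):
    """Assumed binary standardizing function: 1 for a positive input, else 0."""
    return 1 if x > 0 else 0


def fcm_step(a, w, f=binary_threshold, keep_history=False):
    """Compute the next activation vector from the vector a and weight matrix w.

    w[j][i] is the weight of the arc from concept j to concept i.
    With keep_history=True the concept's own previous value is added
    before the standardizing function f is applied.
    """
    n = len(a)
    nxt = []
    for i in range(n):
        total = sum(a[j] * w[j][i] for j in range(n) if j != i)
        if keep_history:
            total += a[i]
        nxt.append(f(total))
    return nxt


# Hypothetical three-concept chain C1 -> C2 -> C3 with excitatory weights.
chain = [
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]
step_without_history = fcm_step([1, 0, 0], chain)
step_with_history = fcm_step([1, 0, 0], chain, keep_history=True)
```

Without the history term the activation moves along the chain, whereas with it the already active concept stays active while its successor is switched on.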


2.2. Basics of Reinforcement Learning (RL)
Markov Decision Processes (MDPs) define the formal framework of reinforcement
learning [13]. More formally, an MDP is defined by:
a. $S$, a finite set of states, $s \in S$.
b. $A$, a finite set of actions available in state $s$, $a \in A(s)$.
c. $r$, a reward function, $r(s, a) \in \mathbb{R}$.
d. $P$, the probability of transition from one state to another depending on the selected action,
$P(s' \mid s, a) = P_a(s, s')$.
The problem is to find an optimal policy of actions that achieves the goal by maximizing the
rewards, starting from any initial state. At each iteration, the agent, being in state $s_k$,
chooses an action $a_k$; according to the outcome, the environment sends either a reward or a
penalty to the agent, given by the following formula: $r_k = h(s_k, a_k, s_{k+1})$.

To find the total cost, which is represented by the formula $\sum_k h(s_k, a_k, s_{k+1})$, the
costs are accumulated at each iteration of the system. In [8] the expected reward is weighted by the
parameter $\gamma$ and becomes $\sum_i \gamma^i h(s_i, a_i, s_{i+1})$ with $0 < \gamma < 1$. The aim
of RL is to find an optimal policy or strategy $\pi^*$ among the different possible strategies $\pi$
for selecting actions. Assuming that an optimal policy $\pi^*$ exists, the Bellman [11] optimality
equation is satisfied:


$$V^{\pi^*}(s_i) = \max_{a} \left\{ R(s_i, a) + \gamma \sum_{s_{i+1}} P(s_{i+1} \mid s_i, a)\, V^{\pi^*}(s_{i+1}) \right\}, \quad \forall s_i \in S$$


The following equation defines the value function of the optimal policy, which reinforcement
learning seeks to estimate:


$$V^*(s) = \max_{\pi} V^{\pi}(s)$$


In the Q-learning technique [13], for any policy $\pi$ and any state $s \in S$, the value of
taking action $a$ in state $s$ under policy $\pi$, denoted $Q^{\pi}(s, a)$, is the expected
discounted future reward starting in $s$, taking $a$, and henceforth following $\pi$. In this case
the optimal value function can also be expressed for a state-action pair:


$$Q^*(s, a) = \max_{\pi} Q^{\pi}(s, a)$$


Q-learning is one of the most popular reinforcement learning methods; it was developed by Watkins
(1989) and is based on TD(0). It involves finding state-action qualities rather than just state
values. The Q-learning technique introduces a quality function $Q$ that assigns a value to each
state-action pair. $Q^{\pi}(s, a)$ is the expected reinforcement when starting from state $s$,
executing action $a$ and then following a policy $\pi$: $Q^{\pi}(s, a) = E[\sum_i \gamma^i r_i]$.
$Q^*(s, a)$ is the optimal state-action value, obtained by following policy $\pi^*$:
$Q^*(s, a) = \max_{\pi} Q^{\pi}(s, a)$. If $Q^*(s_i, a_i)$ is reached for each state-action pair,
we say that the agent can reach the goal starting from any initial state. The value of $Q$ is
updated by the following equation:


$$Q^{k+1}(s_i, a_i) = Q^k(s_i, a_i) + \alpha \left[ h(s_i, a_i, s_{i+1}) + \gamma \max_{a} Q^k(s_{i+1}, a) - Q^k(s_i, a_i) \right]$$
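
As an illustration of this update rule, the following Python sketch applies one tabular Q-learning step. It is a minimal sketch under stated assumptions: the dictionary-based table, the function name `q_learning_update`, the state and action labels and the values of the learning rate `alpha` and discount `gamma` are chosen for the example and are not taken from the experiments.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.5, gamma=0.9):
    """Apply one Q-learning update in place and return the new Q value.

    q maps (state, action) pairs to values; unseen pairs default to 0.
    """
    # Best value reachable from the next state (max, not arg max).
    best_next = max(q[(next_state, a)] for a in actions)
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return q[(state, action)]


# Hypothetical labels: state "Ag", next state "Th", two candidate actions.
q_table = defaultdict(float)
actions = ["self", "no-self"]
first = q_learning_update(q_table, "Ag", "no-self", 1.0, "Th", actions)
second = q_learning_update(q_table, "Ag", "no-self", 1.0, "Th", actions)
```

With an empty table the first rewarded step moves the value halfway towards the reward, and the second step halves the remaining gap again.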


2.3. The adaptation of Learning Fuzzy Cognitive Maps 

The rationale of the proposed immune-response-inspired LFCM is to foster the learning capability
and memory acquisition of the LFCM. To show how these two issues are addressed, the complex adaptive
artificial immune system is modeled in the background of the presented LFCM [15]. In the immune
response, the ability of B cells to memorize most previously encountered antigens enables the system
to mount a more effective reaction in any future encounter. This mechanism of the natural immune
system is usually described as the ability of adaptive learning and immune memory acquisition. It is
the basis of our mathematical adaptation of the Q-learning algorithm, which instructs the agent to
make optimal use of its history, i.e. the value of Q serves to memorize the states visited by the
agent. In other words, once the B cell identifies the interleukin substance from the Th cell
concept, it divides into antibody-synthesizing cells and finally secretes the antibody (Ab).

CASs are distinguished from other systems by the dynamic improvement of their current policy
at each interaction with the environment. This is a local construction that does not require an
assessment of the overall strategy. This observation leads us to neglect the value of the quality
function Q at step $(i+1)$. Mathematically, this means $Q^k(s_{i+1}, a) = 0$, and therefore the
Q update equation of Section 2.2 becomes:


$$Q^{k+1}(s_i, a_i) = Q^k(s_i, a_i) + \alpha \left[ r_i - Q^k(s_i, a_i) \right]$$


The value of Q enables the system to mount a more effective action whenever an already visited
state is encountered again. The Q value is thus designed to instruct the agent to make optimal use
of its history. If the agent is in a state already visited, with a Q value in the table of values,
this value is directly exploited to move to the next state; otherwise the agent explores the
possible actions in this state according to their respective probabilities. The following pseudo
code updates the value of the Q function:

If r = 1 // Award

    $Q^{k+1}(s_i, a_i) = Q^k(s_i, a_i) + \alpha \left[ 1 - Q^k(s_i, a_i) \right]$

If r = 0 // Penalty

    $Q^{k+1}(s_i, a_i) = (1 - \alpha)\, Q^k(s_i, a_i)$


In our approach, the states are represented, after fuzzification, by the input (sensory)
concepts, and the output vector is represented by the set of output (effector) concepts, which
represent the actions to perform in the environment after defuzzification. The motor concepts are
the decision-making mechanism. The exploration of the actions is accompanied by an update of their
probabilities according to the linear scheme [9]:


If r = 1 // Award

    $P^{k+1}(s_i, a_i) = P^k(s_i, a_i) + \beta \left[ 1 - P^k(s_i, a_i) \right]$

If r = 0 // Penalty

    $P^{k+1}(s_i, a_i) = (1 - \beta)\, P^k(s_i, a_i)$
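
To make the two linear update schemes concrete, the short worked example below applies one award followed by one penalty to a single state-action pair. The starting values $Q^0 = 0$ and $P^0 = 0.5$ are the initial values reported in Table 3; the rates $\alpha = 0.5$ and $\beta = 0.2$ are assumptions chosen only for the arithmetic.

```latex
% Assumed rates for illustration only: \alpha = 0.5, \beta = 0.2
\begin{align*}
\text{Award } (r = 1):\quad
  Q^{1} &= Q^{0} + \alpha\,(1 - Q^{0}) = 0 + 0.5\,(1 - 0) = 0.5, \\
  P^{1} &= P^{0} + \beta\,(1 - P^{0}) = 0.5 + 0.2\,(1 - 0.5) = 0.6; \\
\text{Penalty } (r = 0):\quad
  Q^{2} &= (1 - \alpha)\,Q^{1} = 0.5 \times 0.5 = 0.25, \\
  P^{2} &= (1 - \beta)\,P^{1} = 0.8 \times 0.6 = 0.48.
\end{align*}
```

Both quantities stay within $[0, 1]$: an award moves them towards 1 and a penalty shrinks them towards 0.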


2.4. Operational mechanism model

The mechanism for identifying the nature of the antigen and selecting the action to take is
summarized by a fuzzy rule-based system. The essential part of such a system is a set of IF-THEN
linguistic rules whose inputs and outputs are fuzzy statements:

IF a set of conditions is satisfied

THEN a set of results can be inferred

In the proposed approach, the weights $w_{ij}$ are dynamic and can be modified by the
reinforcement learning algorithm, which allows the network to be trained by experience [20]. Based
on the theoretical aspects described above, the pseudo code of Algorithm 2 summarizes our approach.
Algorithm 2: Pseudo code of the proposed approach
Step 1: Read the vector $A^{(t)}$ and the weight matrix $W$.
Step 2: Calculate the output vector $A^{(t+1)}$: $A^{(t+1)} = f\left(A^{(t)} + A^{(t)} \cdot W\right)$
Step 3: Apply the transfer function $f$ to the output vector $A^{(t+1)}$.
Step 4: Among the active concepts, choose the one that has the highest value of the function Q;
if there is none, choose according to the probability.
Step 5: Calculate the new output vector (output concepts) $A^{(t+1)}$.
Step 6: Depending on the response of the environment:

If r = 1 // Award

(Updating the probability $P_i$ and the Q value)

    $Q^{k+1}(s_i, a_i) = Q^k(s_i, a_i) + \alpha \left[ 1 - Q^k(s_i, a_i) \right]$

    $W^{k+1}(C_i, C_j) = W^k(C_i, C_j)$

    $P^{k+1}(a_i) = P^k(a_i) + \beta \left[ 1 - P^k(a_i) \right]$

If r = 0 // Penalty

(Updating the probability $P_i$, the weight of the connection and the value of Q)

    $Q^{k+1}(s_i, a_i) = (1 - \alpha)\, Q^k(s_i, a_i)$

    $W^{k+1}(C_i, C_j) = W^k(C_i, C_j) + \eta \left[ 1 - W^k(C_i, C_j) \right]$

    $P^{k+1}(a_i) = (1 - \beta)\, P^k(a_i)$

Step 7: If the termination conditions are met, stop. Otherwise go to Step 2.
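
Step 6 of Algorithm 2 can be sketched as a single Python function that takes the current Q value, action probability and connection weight and returns their updated values. This is a hedged illustration rather than the implementation used in the simulation: the function name `apply_feedback` and the default rates `alpha`, `beta` and `eta` are assumptions.

```python
def apply_feedback(q, p, w, reward, alpha=0.5, beta=0.2, eta=0.1):
    """Return the updated (q, p, w) after one response of the environment.

    q: Q value of the chosen state-action pair
    p: probability of the chosen action
    w: weight of the connection involved in the action
    reward: 1 for an award, 0 for a penalty
    """
    if reward == 1:
        # Award: reinforce Q and P, leave the weight unchanged.
        q = q + alpha * (1 - q)
        p = p + beta * (1 - p)
    else:
        # Penalty: decay Q and P, adapt the connection weight.
        q = (1 - alpha) * q
        p = (1 - beta) * p
        w = w + eta * (1 - w)
    return q, p, w


# One award followed by one penalty, starting from Q = 0, P = 0.5, W = 0.
after_award = apply_feedback(0.0, 0.5, 0.0, reward=1)
after_penalty = apply_feedback(*after_award, reward=0)
```

Returning new values instead of mutating shared tables keeps the sketch easy to test; a full simulation would store the results back into the Q table, the probability table and the weight matrix.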


3. RESULTS AND ANALYSIS 

To evaluate the performance of our proposed approach, the simulation of the system was
implemented in MATLAB, which comprises fuzzification and defuzzification with FCM modeling [22].
Table 2 shows the weight values between concepts after the defuzzification process in the bimodal
mode. The main purpose of the immune system is to recognize all cells within the body and categorize
those cells as self or non-self. Activation of T cells by antigen-presenting cells (APCs), with the
accessory concept MHC, in lymph nodes is a key initiating event in natural immune responses. In this
case the Th concept (Th cells) is considered the motor concept and all other concepts are considered
accessory concepts. T cells alone are able to differentiate between self and non-self (antigen)
cells. T cell receptors scan the surface of the APC for specific peptides bound to MHC molecules. If
the specific peptides are found, the Th cell is activated, so the NO-SELF action is executed and the
antibody is secreted; otherwise the antigen is recognized as a self cell, the SELF action is
executed and the immune response is terminated.


Natural Immune System Response As Complexe Adaptive System Using Learning Fuzzy ... (Ahmed Tlili)
ISSN: 2252-8938


Table 1. Concept Description of the CAAIS in LFCM Background


Concepts   Description
Ag         Antigens (virus and bacteria)
APC        Antigen Presenting Cells
MHC        Major Histocompatibility Complex molecule
Th         The helper T cell
IL+        The interleukin positive signal secreted by Th cell
B          B cell
Ab         Antibody produced by B cells
Ts         The suppressor T cell
IL-        The interleukin negative signal secreted by Ts cell


The number of concepts has been reduced to 9 in order to limit the complexity of the CAAIS
modeled in this type of LFCM and, to make the proposed technique clearer to non-specialist readers,
we use the fuzzified binary mode. Concepts Ag and Ab are the factor concepts (sensory concept and
effector concept respectively), which represent the input and output concepts (in terms of
interaction with the environment).


Table 2. Weight Values Between Concepts in the Bimodal Mode


Concepts   Ag   APC   MHC   Th   IL+   B    Ab   Ts   IL-
Ag          0   +1     0     0    0    +1    0    0    0
APC         0    0    +1     0    0     0    0    0    0
MHC         0    0     0    +1    0     0    0    0    0
Th         -1    0     0     0   +1     0    0    0    0
IL+         0    0     0     0    0    +1    0    0    0
B           0    0     0     0    0     0   +1    0    0
Ab         -1    0     0     0    0     0    0   +1    0
Ts          0    0     0     0    0     0    0    0   +1
IL-         0    0     0    -1    0     0    0    0    0


The W matrix link associated to this model can be written as follows: 


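$$W = \begin{pmatrix}
 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
-1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
-1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
 0 & 0 & 0 & -1 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}$$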


The FCM (Figure 2) has twelve edges and nine concepts, with excitatory links (+1) from 'Ag' to
'APC', 'Ag' to 'B', 'APC' to 'MHC', 'MHC' to 'Th', 'Th' to 'IL+', 'IL+' to 'B', 'B' to 'Ab', 'Ab' to
'Ts' and 'Ts' to 'IL-', and inhibitory links (-1) from 'Ab' to 'Ag', 'IL-' to 'Th' and 'Th' to 'Ag'.
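
The structure of this map can be checked with a small Python sketch that transcribes the weights of Table 2 (rows are source concepts, columns are target concepts) and lists the non-zero links. The variable and function names are assumptions made for the example, and the entry printed as "Oo" in the extracted table is read as 0.

```python
CONCEPTS = ["Ag", "APC", "MHC", "Th", "IL+", "B", "Ab", "Ts", "IL-"]

# Weight matrix transcribed from Table 2: W[i][j] is the link from
# CONCEPTS[i] to CONCEPTS[j].
W = [
    [0, 1, 0, 0, 0, 1, 0, 0, 0],    # Ag
    [0, 0, 1, 0, 0, 0, 0, 0, 0],    # APC
    [0, 0, 0, 1, 0, 0, 0, 0, 0],    # MHC
    [-1, 0, 0, 0, 1, 0, 0, 0, 0],   # Th
    [0, 0, 0, 0, 0, 1, 0, 0, 0],    # IL+
    [0, 0, 0, 0, 0, 0, 1, 0, 0],    # B
    [-1, 0, 0, 0, 0, 0, 0, 1, 0],   # Ab
    [0, 0, 0, 0, 0, 0, 0, 0, 1],    # Ts
    [0, 0, 0, -1, 0, 0, 0, 0, 0],   # IL-
]


def list_edges(w, names):
    """Return (source, target, weight) for every non-zero entry of w."""
    size = len(names)
    return [(names[i], names[j], w[i][j])
            for i in range(size) for j in range(size) if w[i][j] != 0]


all_edges = list_edges(W, CONCEPTS)
excitatory = [(s, t) for s, t, weight in all_edges if weight > 0]
inhibitory = [(s, t) for s, t, weight in all_edges if weight < 0]
```

Listing the edges this way makes it easy to compare the table against the textual description of the map.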

A concept is active if its value is equal to 1; otherwise it is inactive (binary mode). The
initial activation vector is A = (1 0 0 0 0 0 0 0 0). Table 3 shows the values $P(a_i)$ of the
probabilities of actions and the values of the function Q, which are updated at each iteration.
Table 4 gives the output vector for all iterations in response to the environment.
Table 3. Action Probabilities and Q-Function Values

a_i       P(a_i)   Q(s_i, a_i)      Value
self      0.5      (Ag, self)       0
no-self   0.5      (Ag, no-self)    0


Table 4. Output Vector at Each Iteration


Input vector          Output vector         Iteration
(1 0 0 0 0 0 0 0 0)   1 1 0 0 0 0 0 0 0     1
                      1 1 1 0 0 0 0 0 0     2
                      1 1 1 1 0 0 0 0 0     3
                      1 1 1 1 1 0 0 0 0     4


At iteration 4 the immune system faces a situation, represented by the active concept Th, in
which it has two possible actions. Executing the action SELF deactivates the Ag concept, whereas
executing the action NO-SELF activates the B concept (through the IL+ concept) in order to
neutralize the antigen. The system must choose one of them, and this choice is guided either by the
value of the function Q, if the state has already been visited (the antigen has already been
encountered), or by the probability of the action, if the state is visited for the first time. This
mechanism for identifying the nature of the antigen and selecting the action to take is summarized
by the following three fuzzy rules:


$R_1$: if Ag is Q(Ag, SELF) then Th is SELF-action.
$R_2$: if Ag is Q(Ag, NO-SELF) then Th is NO-SELF-action.
$R_3$: if Ag is not Q(Ag, SELF) and Ag is not Q(Ag, NO-SELF) then Th is the action selected
according to the probability.


In this rule-based system, the conditions of the first two rules $R_1$ and $R_2$ mean that the
system has met the antigen before and classifies it as self or non-self using the updated table of
Q values, whereas rule $R_3$ requires the Th concept to explore the space of possible actions
according to their respective probabilities.
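
The three rules can be read as a simple selection procedure, sketched below in Python. The sketch rests on an assumption about how "already encountered" is detected: a state-action pair counts as known when a positive Q value is stored for it. The function name `select_action` and the dictionary layout are likewise hypothetical.

```python
import random


def select_action(q_table, probs, state, actions=("SELF", "NO-SELF"),
                  rng=random):
    """Choose the Th action for the given state.

    R1/R2: if a positive Q value is stored for some action in this state,
           exploit the action with the highest Q value.
    R3:    otherwise explore, drawing an action according to probs.
    """
    known = {a: q_table[(state, a)]
             for a in actions if q_table.get((state, a), 0) > 0}
    if known:
        return max(known, key=known.get)
    weights = [probs[a] for a in actions]
    return rng.choices(list(actions), weights=weights, k=1)[0]


# An antigen already classified as non-self is handled by rule R2.
seen_before = select_action({("Ag", "NO-SELF"): 0.5},
                            {"SELF": 0.5, "NO-SELF": 0.5}, "Ag")
# A new antigen falls under rule R3; the weights here force SELF.
first_visit = select_action({}, {"SELF": 1.0, "NO-SELF": 0.0}, "Ag")
```

Passing the random generator as a parameter allows a seeded `random.Random` instance to be supplied when reproducible exploration is needed.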


4. CONCLUSION 

The soft computing technique of fuzzy cognitive maps for modeling and simulating complex
adaptive systems has been discussed in this paper. A new connection between fuzzy systems and
reinforcement learning has been proposed for analyzing the natural immune system response. In the
artificial intelligence field, the natural immune system (NIS) is argued to be a complex adaptive
system. Global emergent behaviors can be observed by applying local rules to individual cells, as
described by Holland in complex adaptive system theory. The complexity of modeling CASs with ABMs
and CAs, and the criticism raised by the community against these models, led us to seek another
approach, one built on concepts inspired by the domain of life. In psychology, behavior is generally
related to the concepts of emotions, perceptions and sensations. These key concepts of life can be
supported by FCMs. CASs therefore belong to the field of artificial life more than to other areas of
computing. The area of FCMs, despite the improvements made by different research teams around the
world, remains dense and poorly unified.